Fault tolerance and configurability in DSM coherence protocols
Abstract
With the advent of large networks and the demand for uninterrupted service, computer systems need to be more robust and fault tolerant. There are numerous ways to implement fault tolerance and recovery, but central to all of them is the requirement for replicated data, which provides high data availability. We believe that a protocol must not only provide replication but do so at low operational overhead. Further, the protocol must provide configurable mechanisms for varying the level of replication, so that the system can be operated at the desired overhead cost. We have developed several Distributed Shared Memory (DSM) protocols and use them in a program-driven simulation to examine their robustness, fault tolerance, and configurability. Our investigation compares the Write-Invalidate, Write-Invalidate with Downgrading, and Write-Broadcast protocols with several instances of the Boundary-Restricted coherence protocol class. The DSM application suite contains programs representative of various memory-access patterns and behaviors. This paper examines the performance of these protocols under different workloads and analyzes the operation costs, fault tolerance, and configurability of each.
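The trade-off the abstract describes (how many valid copies of a page survive a write versus how many messages the write costs) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' simulator: the cluster size `NODES`, the one-message-per-copy cost rule, and the `write_bounded` policy are all hypothetical. `write_bounded` simply keeps the number of copies between a lower and an upper bound, in the spirit suggested by the name Boundary-Restricted; the actual protocol class (and Write-Invalidate with Downgrading, which is not modeled here) is defined in the paper.

```python
from dataclasses import dataclass, field
from functools import partial

NODES = 8  # hypothetical cluster size


@dataclass
class Page:
    """One DSM page; `holders` is the set of nodes with a valid copy."""
    holders: set = field(default_factory=set)


def write_invalidate(page: Page, writer: int) -> int:
    """Writer keeps the only valid copy; every other copy is invalidated."""
    others = page.holders - {writer}
    page.holders = {writer}
    return len(others)  # assumed cost: one invalidation per other copy


def write_broadcast(page: Page, writer: int) -> int:
    """The new value is pushed to every existing copy; none are dropped."""
    others = page.holders - {writer}
    page.holders = others | {writer}
    return len(others)  # assumed cost: one update per other copy


def write_bounded(page: Page, writer: int, lower: int, upper: int) -> int:
    """Hypothetical bounded policy: after a write, keep between `lower`
    and `upper` valid copies (the writer's copy included)."""
    others = sorted(page.holders - {writer})
    keep = others[: upper - 1]  # update at most upper-1 other copies
    spare = [n for n in range(NODES) if n != writer and n not in keep]
    while len(keep) + 1 < lower and spare:
        keep.append(spare.pop(0))  # create new copies to reach the lower bound
    page.holders = {writer, *keep}
    # Each old copy gets an update or an invalidation; new copies get an update.
    return len(set(others) | set(keep))


def run(policy, initial, writes=3):
    """Apply `writes` consecutive writes by node 0; return (messages, copies)."""
    page = Page(set(initial))
    messages = sum(policy(page, 0) for _ in range(writes))
    return messages, len(page.holders)


if __name__ == "__main__":
    bounded = partial(write_bounded, lower=3, upper=4)
    print("write-invalidate", run(write_invalidate, initial=range(6)))  # (5, 1)
    print("write-broadcast ", run(write_broadcast, initial=range(6)))   # (15, 6)
    print("bounded(3, 4)   ", run(bounded, initial=range(6)))           # (11, 4)
```

With a page initially held by six nodes and three consecutive writes by node 0, write-invalidate leaves a single copy for 5 messages, write-broadcast keeps all six copies for 15, and the bounded policy keeps four copies for 11. The two bounds act as the configuration knob: raising `lower` buys availability, lowering `upper` caps the per-write cost.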
Similar resources
Fault Tolerance and Configurability in DSM Coherence
tolerance. To address these aspects, the DSM coherence protocol must offer increased redundancy, decreased reliance on centralized data and control, support for servicing requests locally, and control over the degree of data availability on a per-data-unit basis. In a page-based DSM system, as assumed in this article, the unit of interest is a DSM page. Object-based systems can use the same pro...
Design and Analysis of Highly Available and Scalable Coherence Protocols for Distributed Shared Memory Systems Using Stochastic Modeling
Larger size networks require DSM coherence protocols which scale well. Fault-tolerance in terms of high availability is required for data access and for uninterrupted DSM service since large-scale environments have a greater number of potentially malfunctioning components. We present a new class of coherence protocols for DSM systems whose instances offer highly available access to shared data a...
Flexible Fault Tolerance in Configurable Middleware for Embedded Systems
MicroQoSCORBA (MQC) is a middleware platform that focuses on embedded applications by providing a very fine level of configurability of its internal orthogonal components. Using this configurability, a developer can generate a customized middleware instantiation that is tailored to both the requirements and constraints of a specific embedded application and the embedded hardware. One of the key...
Fast and Low Cost Recovery Techniques for Distributed Shared Memory
The goal of this paper is to indicate how the mechanisms already available in standard Distributed Shared Memory (DSM) systems can be efficiently used to reduce the cost of fault-tolerance. It can be achieved by using DSM replication mechanism for recovery and integration of both recovery and memory coherence protocols. We analyze recently developed techniques of recovery for DSM systems which ...
A Recoverable Distributed Shared Memory Integrating Coherence and Recoverability
Large-scale distributed systems are very attractive for the execution of parallel applications requiring a huge computing power. However, their high probability of site failure is unacceptable, especially for long time running applications. In this paper, we address this problem and propose a checkpointing mechanism relying on a recoverable distributed shared memory (DSM). Although most recover...
Journal: IEEE Concurrency
Volume: 8
Publication date: 2000